Uncorrelated Lasso
Authors
Abstract
Lasso-type variable selection has found increasingly wide application in machine learning. In this paper, the uncorrelated Lasso is proposed for variable selection: variable de-correlation is considered simultaneously with variable selection, so that the selected variables are as uncorrelated as possible. An effective iterative algorithm, with a proof of convergence, is presented to solve the sparse optimization problem. Experiments on benchmark data sets show that the proposed method achieves better classification performance than many state-of-the-art variable selection methods.

In many regression applications there are many irrelevant predictors, which may hide the relationship between the response and the most relevant predictors. A common way to resolve this problem is variable selection, that is, selecting a subset of the most representative or discriminative predictors from the input predictor set. The central requirement is that a good predictor set contain predictors that are highly correlated with the response but uncorrelated with each other.

Various kinds of variable selection methods have been developed to tackle high dimensionality. The main challenge is to select a set of predictors, as small as possible, that helps the classifier classify the learning examples accurately. The major class of variable selection methods, the filter type, is independent of the classifier; examples include the t-test, the F-statistic (Ding and Peng 2005), ReliefF (Kononenko 1994), mRMR (Peng, Long, and Ding 2005), and information gain/mutual information (Raileanu and Stoffel 2004). Wrapper-type methods instead treat the classifier as a black box for evaluating subsets of predictors (Kohavi and John 1997). There is also a stochastic-search method for variable selection based on the generalized singular g-prior (gsg-SSVS) (Yang and Song 2010).

Recently, sparsity regularization has received increasing attention in variable selection studies. The well-known Lasso (Least Absolute Shrinkage and Selection Operator) is a penalized least squares method with ℓ1-regularization, which shrinks and suppresses variables to achieve variable selection (Tibshirani 1996). Owing to the nature of the ℓ1-norm penalty, the Lasso performs continuous shrinkage and automatic variable selection simultaneously. As variable selection becomes increasingly important in modern data analysis, the Lasso is all the more appealing for its sparse representation. The Elastic Net (Zou and Hastie 2005) adds ℓ2-regularization to the Lasso to make the regression coefficients more stable. The Group Lasso (Yuan and Lin 2006) assumes that the covariates are clustered in groups and penalizes the sum of the Euclidean norms of the loadings in each group. The Supervised Group Lasso (SGLasso) (Ma, Song, and Huang 2007) performs K-means clustering before applying the Group Lasso.

In this paper, motivated by this previous sparse-learning research, we propose to incorporate variable correlation into sparse-learning-based variable selection. We note that previous Lasso-type variable selection does not take variable correlations into account, whereas in most real-life data the predictors are correlated. Strongly correlated predictors share similar properties and carry overlapping information.
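For reference, the three penalized objectives mentioned above can be written compactly as follows. These are the standard formulations from the literature, not displays reproduced from this paper; X ∈ R^{p×n} (predictors in rows) and y ∈ R^n are as set up in the review section below, β_g denotes the loadings of group g of size p_g, and λ, λ1, λ2 ≥ 0 are tuning parameters:

\[
\begin{aligned}
\text{Lasso:} \quad & \min_{\beta}\; \|y - X^{\top}\beta\|_2^2 + \lambda \|\beta\|_1 \\
\text{Elastic Net:} \quad & \min_{\beta}\; \|y - X^{\top}\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2 \\
\text{Group Lasso:} \quad & \min_{\beta}\; \|y - X^{\top}\beta\|_2^2 + \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\|\beta_g\|_2
\end{aligned}
\]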
In some cases, especially when the number of selected predictors is very limited, the selected predictors need to carry as much information as possible, so strongly correlated predictors should not enter the model together: selecting only one predictor out of each group of strongly correlated predictors lets a limited selection carry more information. We therefore need to take variable correlation into account in variable selection. To our knowledge, existing Lasso-type variable selection methods have not considered variable correlation.

In the following, we first briefly review the standard Lasso and the Elastic Net, then present our formulation of uncorrelated Lasso-type variable selection. An effective iterative algorithm, with its proof of convergence, is presented to solve the sparse optimization problem. Experiments on two benchmark gene data sets are performed to evaluate the algorithm. The paper concludes in the last section.

Brief review of Lasso and Elastic Net

Let there be a set of training data {(x_i, y_i), i = 1, 2, …, n}, where x_i = (x_{1i}, x_{2i}, …, x_{pi})^T ∈ R^p is a vector of predictors and y_i ∈ R is its corresponding response. In matrix form, X = [x_1, x_2, …, x_n] ∈ R^{p×n} and y = (y_1, y_2, …, y_n)^T ∈ R^n; the Lasso (Tibshirani 1996) is then a penalized linear regression problem between the predictors and the response.
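The paper's exact objective and iterative algorithm are not reproduced in this abstract. As a rough, self-contained sketch of the idea (ours, not the authors' method), the following Python code augments the Lasso objective with a quadratic de-correlation term b^T C b, where C holds squared pairwise correlations between predictors, and solves it by proximal gradient descent (ISTA); the penalty form, the parameter names lam and gamma, and the solver choice are all assumptions:

import numpy as np

def uncorrelated_lasso(X, y, lam=0.1, gamma=0.1, n_iter=500):
    """Illustrative solver for a Lasso objective with a de-correlation term:

        min_b  ||y - X.T @ b||^2  +  gamma * b @ C @ b  +  lam * ||b||_1

    X is p x n (predictors in rows, samples in columns), matching the
    notation above.  C[i, j] = corr(x_i, x_j)^2 is an ASSUMED form of the
    de-correlation penalty; this is NOT the paper's algorithm, just a
    sketch solved by proximal gradient descent (ISTA)."""
    p, n = X.shape
    C = np.corrcoef(X) ** 2            # p x p squared-correlation matrix (assumed form)
    A = X @ X.T + gamma * C            # smooth part equals b@A@b - 2*b@(X@y) + const
    step = 1.0 / (2.0 * np.linalg.eigvalsh(A).max())  # 1/L, L = Lipschitz constant
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = 2.0 * (A @ b - X @ y)   # gradient of the smooth part
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return b

# Usage on synthetic data: two nearly identical predictors plus an
# independent one; with gamma > 0 the solver tends to keep only one
# member of the correlated pair.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.vstack([x1, x1 + 0.01 * rng.normal(size=100), rng.normal(size=100)])
y = X[0] + X[2] + 0.1 * rng.normal(size=100)
print(uncorrelated_lasso(X, y, lam=0.5, gamma=5.0))

With gamma = 0 this reduces to a plain ISTA solver for the Lasso; increasing gamma penalizes placing weight on several mutually correlated predictors at once.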
Similar resources
Exclusive Feature Learning on Arbitrary Structures via ℓ1,2-norm
Group LASSO is widely used to enforce structural sparsity, achieving sparsity at the inter-group level. In this paper, we propose a new formulation called "exclusive group LASSO", which brings out sparsity at the intra-group level in the context of feature selection. The proposed exclusive group LASSO is applicable to arbitrary feature structures, regardless of their overlapping or non-overl...
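For concreteness, the exclusive group penalty referred to above is usually written as a squared ℓ1-norm within each group (a standard form from the exclusive-lasso literature, not quoted from the truncated abstract):

\[
\Omega(\beta) \;=\; \sum_{g=1}^{G} \Big( \sum_{j \in g} |\beta_j| \Big)^{2} \;=\; \sum_{g=1}^{G} \|\beta_g\|_1^2 ,
\]

so the inner ℓ1-norm induces sparsity within each group, while the outer square discourages any single group from absorbing all of the selected features.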
Standardization and the Group Lasso Penalty
We re-examine the original Group Lasso paper of Yuan and Lin (2007). The form of penalty in that paper seems to be designed for problems with uncorrelated features, but the statistical community has adopted it for general problems with correlated features. We show that for this general situation, a Group Lasso with a different choice of penalty matrix is generally more effective. We give insigh...
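For context (our recollection of the standardized group lasso of Simon and Tibshirani, offered as an assumption since the abstract above is truncated), the contrast is between the original Yuan–Lin penalty and one whose penalty matrix involves the group design block X_g:

\[
\lambda \sum_{g} \sqrt{p_g}\,\|\beta_g\|_2
\qquad \text{vs.} \qquad
\lambda \sum_{g} \|X_g \beta_g\|_2 ,
\]

the latter being better behaved when the features within a group are correlated.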
Uncorrelated Group LASSO
The ℓ2,1-norm is an effective regularization for enforcing simple group sparsity in feature learning. To capture some subtle structures among feature groups, we propose a new regularization called the exclusive group ℓ2,1-norm. It enforces sparsity at the intra-group level by using the ℓ2,1-norm, while encouraging the selected features to distribute across different groups by using the ℓ2-norm at the inter-grou...
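For reference, the standard ℓ2,1-norm of a weight matrix W ∈ R^{p×k} with rows w^i is

\[
\|W\|_{2,1} \;=\; \sum_{i=1}^{p} \|w^{i}\|_2 \;=\; \sum_{i=1}^{p} \sqrt{\textstyle\sum_{j=1}^{k} w_{ij}^{2}} ,
\]

which, used as a regularizer, drives entire rows (features) to zero. The exclusive variant described above applies this norm within groups while an ℓ2-norm couples the groups; its exact composition is not recoverable from the truncated abstract.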
Support Union Recovery in High-Dimensional Multivariate Regression
In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B* ∈ R^{p×K} of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1/ℓ2 norm is used for support union recovery, or recovery of the set of s rows for which B* is non-zero. Under high-dimensional scaling, w...
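The estimator studied above can be written as follows; this is the standard multivariate group Lasso display (the 1/(2n) scaling is a common convention we assume, and here X is the n × p design matrix of this related paper rather than the p × n matrix used earlier on this page):

\[
\widehat{B} \;=\; \arg\min_{B \in \mathbb{R}^{p \times K}} \; \frac{1}{2n}\,\|Y - XB\|_F^2 \;+\; \lambda \sum_{j=1}^{p} \|B_{j\cdot}\|_2 ,
\]

where B_{j·} denotes the j-th row of B. The ℓ1/ℓ2 block penalty zeroes out whole rows, so the estimated support union is the set of rows with nonzero norm.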
Publication date: 2013